AITopics | safety threshold

Collaborating Authors

safety threshold

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes

Matteo Turchetta, Felix Berkenkamp, Andreas Krause

Neural Information Processing SystemsMay-1-2026, 05:56:07 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.64)

Add feedback

fdb11be1acf5e3724737dd585e590146-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 19:51:43 GMT

demonstration, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangdong Province > Guangzhou (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

BIM-Discrepancy-Driven Active Sensing for Risk-Aware UAV-UGV Navigation

Mojtahedi, Hesam, Akhavian, Reza

arXiv.org Artificial IntelligenceNov-19-2025

This paper presents a BIM-discrepancy-driven active sensing framework for cooperative navigation between unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) in dynamic construction environments. Traditional navigation approaches rely on static Building Information Modeling (BIM) priors or limited onboard perception. In contrast, our framework continuously fuses real-time LiDAR data from aerial and ground robots with BIM priors to maintain an evolving 2D occupancy map. We quantify navigation safety through a unified corridor-risk metric integrating occupancy uncertainty, BIM-map discrepancy, and clearance. When risk exceeds safety thresholds, the UAV autonomously re-scans affected regions to reduce uncertainty and enable safe replanning. Compared to frontier-based exploration, our approach achieves similar uncertainty reduction in half the mission time. These results demonstrate that integrating BIM priors with risk-adaptive aerial sensing enables scalable, uncertainty-aware autonomy for construction robotics. Introduction Construction sites are among the most dynamic, unstructured, and safety-critical environments for autonomous robots. Unlike factory floors or structured indoor spaces, these environments are marked by continual change. New buildings are erected, materials are relocated, and the movement of heavy machinery and workers can be unpredictable. Such conditions make autonomous navigation particularly challenging. Construction 4.0 [1], emphasizing automation and digitalization, is moving robotics from trial phases to regular use on construction sites.

artificial intelligence, exploration, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2511.14037

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology (0.67)
Construction & Engineering (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Add feedback

An Offline Adaptation Framework for Constrained Multi-Objective Reinforcement Learning

Neural Information Processing SystemsOct-10-2025, 22:29:30 GMT

In the standard reinforcement learning (RL) setting, the primary goal is to obtain a policy that maximizes a cumulative scalar reward [Sutton and Barto, 2018].

dataset, demonstration, target preference, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangdong Province > Guangzhou (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Sandbagging in a Simple Survival Bandit Problem

Dyer, Joel, Ornia, Daniel Jarne, Bishop, Nicholas, Calinescu, Anisoara, Wooldridge, Michael

arXiv.org Machine LearningOct-1-2025

Evaluating the safety of frontier AI systems is an increasingly important concern, helping to measure the capabilities of such models and identify risks before deployment. However, it has been recognised that if AI agents are aware that they are being evaluated, such agents may deliberately hide dangerous capabilities or intentionally demonstrate suboptimal performance in safety-related tasks in order to be released and to avoid being deactivated or retrained. Such strategic deception - often known as "sandbagging" - threatens to undermine the integrity of safety evaluations. For this reason, it is of value to identify methods that enable us to distinguish behavioural patterns that demonstrate a true lack of capability from behavioural patterns that are consistent with sandbagging. In this paper, we develop a simple model of strategic deception in sequential decision-making tasks, inspired by the recently developed survival bandit framework. We demonstrate theoretically that this problem induces sandbagging behaviour in optimal rational agents, and construct a statistical test to distinguish between sandbagging and incompetence from sequences of test scores. In simulation experiments, we investigate the reliability of this test in allowing us to distinguish between such behaviours in bandit models. This work aims to establish a potential avenue for developing robust statistical procedures for use in the science of frontier model evaluations.

agent, evaluation period, surv, (15 more...)

arXiv.org Machine Learning

2509.26239

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Adaptive Evolutionary Framework for Safe, Efficient, and Cooperative Autonomous Vehicle Interactions

Tian, Zhen, Lin, Zhihao

arXiv.org Artificial IntelligenceSep-10-2025

Modern transportation systems face significant challenges in ensuring road safety, given serious injuries caused by road accidents. The rapid growth of autonomous vehicles (AVs) has prompted new traffic designs that aim to optimize interactions among AVs. However, effective interactions between AVs remains challenging due to the absence of centralized control. Besides, there is a need for balancing multiple factors, including passenger demands and overall traffic efficiency. Traditional rule-based, optimization-based, and game-theoretic approaches each have limitations in addressing these challenges. Rule-based methods struggle with adaptability and generalization in complex scenarios, while optimization-based methods often require high computational resources. Game-theoretic approaches, such as Stackelberg and Nash games, suffer from limited adaptability and potential inefficiencies in cooperative settings. This paper proposes an Evolutionary Game Theory (EGT)-based framework for AV interactions that overcomes these limitations by utilizing a decentralized and adaptive strategy evolution mechanism. A causal evaluation module (CEGT) is introduced to optimize the evolutionary rate, balancing mutation and evolution by learning from historical interactions. Simulation results demonstrate the proposed CEGT outperforms EGT and popular benchmark games in terms of lower collision rates, improved safety distances, higher speeds, and overall better performance compared to Nash and Stackelberg games across diverse scenarios and parameter settings.

artificial intelligence, machine learning, vehicle, (19 more...)

arXiv.org Artificial Intelligence

2509.07411

Country: Europe > United Kingdom (0.28)

Genre: Research Report (0.70)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Transportation > Passenger (0.88)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Safe Planning and Policy Optimization via World Model Learning

Latyshev, Artem, Gorbov, Gregory, Panov, Aleksandr I.

arXiv.org Artificial IntelligenceJun-6-2025

Reinforcement Learning (RL) applications in real-world scenarios must prioritize safety and reliability, which impose strict constraints on agent behavior. Model-based RL leverages predictive world models for action planning and policy optimization, but inherent model inaccuracies can lead to catastrophic failures in safety-critical settings. We propose a novel model-based RL framework that jointly optimizes task performance and safety. To address world model errors, our method incorporates an adaptive mechanism that dynamically switches between model-based planning and direct policy execution. We resolve the objective mismatch problem of traditional model-based approaches using an implicit world model. Furthermore, our framework employs dynamic safety thresholds that adapt to the agent's evolving capabilities, consistently selecting actions that surpass safe policy suggestions in both performance and safety. Experiments demonstrate significant improvements over non-adaptive methods, showing that our approach optimizes safety and performance simultaneously rather than merely meeting minimum safety requirements. The proposed framework achieves robust performance on diverse safety-critical continuous control tasks, outperforming existing methods.

machine learning, reinforcement learning, world model, (17 more...)

arXiv.org Artificial Intelligence

2506.04828

Country: Europe > Russia (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

HASARD: A Benchmark for Vision-Based Safe Reinforcement Learning in Embodied Agents

Tomilin, Tristan, Fang, Meng, Pechenizkiy, Mykola

arXiv.org Artificial IntelligenceMar-11-2025

Advancing safe autonomous systems through reinforcement learning (RL) requires robust benchmarks to evaluate performance, analyze methods, and assess agent competencies. Humans primarily rely on embodied visual perception to safely navigate and interact with their surroundings, making it a valuable capability for RL agents. However, existing vision-based 3D benchmarks only consider simple navigation tasks. To address this shortcoming, we introduce \textbf{HASARD}, a suite of diverse and complex tasks to $\textbf{HA}$rness $\textbf{SA}$fe $\textbf{R}$L with $\textbf{D}$oom, requiring strategic decision-making, comprehending spatial relationships, and predicting the short-term future. HASARD features three difficulty levels and two action spaces. An empirical evaluation of popular baseline methods demonstrates the benchmark's complexity, unique challenges, and reward-cost trade-offs. Visualizing agent navigation during training with top-down heatmaps provides insight into a method's learning process. Incrementally training across difficulty levels offers an implicit learning curriculum. HASARD is the first safe RL benchmark to exclusively target egocentric vision-based learning, offering a cost-effective and insightful way to explore the potential and boundaries of current and future safe RL methods. The environments and baseline implementations are open-sourced at https://sites.google.com/view/hasard-bench/.

agent, conference paper, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2503.08241

Country:

Europe > United Kingdom > England > North Yorkshire > York (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)
Europe > Italy > Lazio > Rome (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Information Technology (1.00)
Leisure & Entertainment > Games > Computer Games (0.46)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Auction-Based Regulation for Artificial Intelligence

Bornstein, Marco, Che, Zora, Julapalli, Suhas, Mohamed, Abdirisak, Bedi, Amrit Singh, Huang, Furong

arXiv.org Artificial IntelligenceOct-2-2024

In an era of "moving fast and breaking things", regulators have moved slowly to pick up the safety, bias, and legal pieces left in the wake of broken Artificial Intelligence (AI) deployment. Since AI models, such as large language models, are able to push misinformation and stoke division within our society, it is imperative for regulators to employ a framework that mitigates these dangers and ensures user safety. While there is much-warranted discussion about how to address the safety, bias, and legal woes of state-of-the-art AI models, the number of rigorous and realistic mathematical frameworks to regulate AI safety is lacking. We take on this challenge, proposing an auction-based regulatory mechanism that provably incentivizes model-building agents (i) to deploy safer models and (ii) to participate in the regulation process. We provably guarantee, via derived Nash Equilibria, that each participating agent's best strategy is to submit a model safer than a prescribed minimum-safety threshold. Empirical results show that our regulatory auction boosts safety and participation rates by 20% and 15% respectively, outperforming simple regulatory frameworks that merely enforce minimum safety standards.

agent, auction, regulator, (16 more...)

arXiv.org Artificial Intelligence

2410.01871

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
North America > United States > California (0.14)
North America > United States > Virginia (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Law > Statutes (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (0.93)
Media > News (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

Add feedback

An Offline Adaptation Framework for Constrained Multi-Objective Reinforcement Learning

Lin, Qian, Liu, Zongkai, Mo, Danying, Yu, Chao

arXiv.org Artificial IntelligenceSep-15-2024

In recent years, significant progress has been made in multi-objective reinforcement learning (RL) research, which aims to balance multiple objectives by incorporating preferences for each objective. In most existing studies, specific preferences must be provided during deployment to indicate the desired policies explicitly. However, designing these preferences depends heavily on human prior knowledge, which is typically obtained through extensive observation of high-performing demonstrations with expected behaviors. In this work, we propose a simple yet effective offline adaptation framework for multi-objective RL problems without assuming handcrafted target preferences, but only given several demonstrations to implicitly indicate the preferences of expected policies. Additionally, we demonstrate that our framework can naturally be extended to meet constraints on safety-critical objectives by utilizing safe demonstrations, even when the safety thresholds are unknown. Empirical results on offline multi-objective and safe tasks demonstrate the capability of our framework to infer policies that align with real preferences while meeting the constraints implied by the provided demonstrations.

demonstration, safety threshold, target preference, (13 more...)

arXiv.org Artificial Intelligence

2409.09958

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.81)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback